image reconstruction
- Europe > Germany > Lower Saxony > Gottingen (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.05)
- North America > United States > Texas > Harris County > Houston (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Information Technology (0.68)
A Appendix
KAN oversaw the project and contributed valuable feedback. MindEye was developed using a training and validation set of Subject 1's data, with the test set (and other subjects' data) untouched until final PyTorch code for the MLP backbone and projector is depicted in Algorithm 1. Specifics on how we DALL-E 2. This makes our prior much faster at inference time. For simplicity we use bidirectional attention in our final model. To map to Stable Diffusion's V AE latent space we use a low-level pipeline with the same architecture as the high level pipeline. Recent works in low-level vision (super-resolution, denoising, deblurring, etc.) have observed that This performs worse than only applying the loss in latent space and also requires significantly more GPU memory.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Massachusetts (0.04)
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (1.00)
Focus On What Matters: Separated Models For Visual-Based RL Generalization
A primary challenge for visual-based Reinforcement Learning (RL) is to generalize effectively across unseen environments. Although previous studies have explored different auxiliary tasks to enhance generalization, few adopt image reconstruction due to concerns about exacerbating overfitting to task-irrelevant features during training. Perceiving the pre-eminence of image reconstruction in representation learning, we propose SMG (\blue{S}eparated \blue{M}odels for \blue{G}eneralization), a novel approach that exploits image reconstruction for generalization. SMG introduces two model branches to extract task-relevant and task-irrelevant representations separately from visual observations via cooperatively reconstruction. Built upon this architecture, we further emphasize the importance of task-relevant features for generalization. Specifically, SMG incorporates two additional consistency losses to guide the agent's focus toward task-relevant areas across different scenarios, thereby achieving free from overfitting. Extensive experiments in DMC demonstrate the SOTA performance of SMG in generalization, particularly excelling in video-background settings. Evaluations on robotic manipulation tasks further confirm the robustness of SMG in real-world applications.
Selective Masking based Self-Supervised Learning for Image Semantic Segmentation
This paper proposes a novel self-supervised learning method for semantic segmentation using selective masking image reconstruction as the pretraining task. Our proposed method replaces the random masking augmentation used in most masked image modelling pretraining methods. The proposed selective masking method selectively masks image patches with the highest reconstruction loss by breaking the image reconstruction pretraining into iterative steps to leverage the trained model's knowledge. We show on two general datasets (Pascal VOC and Cityscapes) and two weed segmentation datasets (Nassar 2020 and Sugarbeets 2016) that our proposed selective masking method outperforms the traditional random masking method and supervised ImageNet pretraining on downstream segmentation accuracy by 2.9% for general datasets and 2.5% for weed segmentation datasets. Furthermore, we found that our selective masking method significantly improves accuracy for the lowest-performing classes. Lastly, we show that using the same pretraining and downstream dataset yields the best result for low-budget self-supervised pretraining. Our proposed Selective Masking Image Reconstruction method provides an effective and practical solution to improve end-to-end semantic segmentation workflows, especially for scenarios that require limited model capacity to meet inference speed and computational resource requirements.
- North America > Canada > Saskatchewan > Saskatoon (0.04)
- North America > United States > North Dakota > Williams County (0.04)
- Transportation > Ground (0.46)
- Health & Medicine > Diagnostic Medicine > Imaging (0.46)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Tada-DIP: Input-adaptive Deep Image Prior for One-shot 3D Image Reconstruction
Bell, Evan, Liang, Shijun, Alkhouri, Ismail, Ravishankar, Saiprasad
Deep Image Prior (DIP) has recently emerged as a promising one-shot neural-network based image reconstruction method. However, DIP has seen limited application to 3D image reconstruction problems. In this work, we introduce Tada-DIP, a highly effective and fully 3D DIP method for solving 3D inverse problems. By combining input-adaptation and denoising regularization, Tada-DIP produces high-quality 3D reconstructions while avoiding the overfitting phenomenon that is common in DIP. Experiments on sparse-view X-ray computed tomography reconstruction validate the effectiveness of the proposed method, demonstrating that Tada-DIP produces much better reconstructions than training-data-free baselines and achieves reconstruction performance on par with a supervised network trained using a large dataset with fully-sampled volumes.
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- North America > United States > Michigan > Ingham County > Lansing (0.04)
- North America > United States > Michigan > Ingham County > East Lansing (0.04)
Provenance-Driven Reliable Semantic Medical Image Vector Reconstruction via Lightweight Blockchain-Verified Latent Fingerprints
Rasheed, Mohsin, Al-Mamun, Abdullah
Medical imaging is essential for clinical diagnosis, yet real-world data frequently suffers from corruption, noise, and potential tampering, challenging the reliability of AI-assisted interpretation. Conventional reconstruction techniques prioritize pixel-level recovery and may produce visually plausible outputs while compromising anatomical fidelity, an issue that can directly impact clinical outcomes. We propose a semantic-aware medical image reconstruction framework that integrates high-level latent embeddings with a hybrid U-Net architecture to preserve clinically relevant structures during restoration. To ensure trust and accountability, we incorporate a lightweight blockchain-based provenance layer using scale-free graph design, enabling verifiable recording of each reconstruction event without imposing significant overhead. Extensive evaluation across multiple datasets and corruption types demonstrates improved structural consistency, restoration accuracy, and provenance integrity compared with existing approaches. By uniting semantic-guided reconstruction with secure traceability, our solution advances dependable AI for medical imaging, enhancing both diagnostic confidence and regulatory compliance in healthcare environments.
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- North America > Mexico > Gulf of Mexico (0.04)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.34)
VIVAT: Virtuous Improving VAE Training through Artifact Mitigation
Novitskiy, Lev, Vasilev, Viacheslav, Kovaleva, Maria, Arkhipkin, Vladimir, Dimitrov, Denis
Variational Autoencoders (VAEs) remain a cornerstone of generative computer vision, yet their training is often plagued by artifacts that degrade reconstruction and generation quality. This paper introduces VIVAT, a systematic approach to mitigating common artifacts in KL-VAE training without requiring radical architectural changes. We present a detailed taxonomy of five prevalent artifacts - color shift, grid patterns, blur, corner and droplet artifacts - and analyze their root causes. Through straightforward modifications, including adjustments to loss weights, padding strategies, and the integration of Spatially Conditional Normalization, we demonstrate significant improvements in VAE performance. Our method achieves state-of-the-art results in image reconstruction metrics (PSNR and SSIM) across multiple benchmarks and enhances text-to-image generation quality, as evidenced by superior CLIP scores. By preserving the simplicity of the KL-VAE framework while addressing its practical challenges, VIVAT offers actionable insights for researchers and practitioners aiming to optimize VAE training.
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (5 more...)
- North America > United States (0.14)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Asia > Middle East > Israel (0.05)
- North America > United States (0.04)
- North America > Canada (0.04)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Therapeutic Area > Neurology (0.70)